Optimal Bounds for Estimating Entropy with PMF Queries

Authors

  • Cafer Caferov
  • Baris Kaya
  • Ryan O'Donnell
  • A. C. Cem Say
Abstract

Let p be an unknown probability distribution on [n] := {1, 2, ..., n} that we can access via two kinds of queries: a SAMP query takes no input and returns x ∈ [n] with probability p[x]; a PMF query takes as input x ∈ [n] and returns the value p[x]. We consider the task of estimating the entropy of p to within ±∆ (with high probability). For the usual Shannon entropy H(p), we show that Ω(log n/∆) queries are necessary, matching a recent upper bound of Canonne and Rubinfeld. For the Rényi entropy H_α(p), where α > 1, we show that Θ(n^(1−1/α)/2∆) queries are necessary and sufficient. This complements recent work of Acharya et al. in the SAMP-only model that showed O(n^(1−1/α)) queries suffice when α is an integer, but Ω̃(n) queries are necessary when α is a noninteger. All of our lower bounds also easily extend to the model where CDF queries (given x, return ∑_{y≤x} p[y]) are allowed.
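The query model in the abstract can be illustrated with a small sketch. The code below is not the paper's algorithm: it is a naive estimator built on the standard identities H(p) = E_{x∼p}[log(1/p[x])] and H_α(p) = log(∑_x p[x]^α)/(1−α), where ∑_x p[x]^α = E_{x∼p}[p[x]^(α−1)]. The class `DistributionOracle`, both function names, the use of base-2 logarithms, and the `num_samples` parameter are assumptions made for illustration; the sample counts are not tuned to the bounds stated above.

```python
import math
import random


class DistributionOracle:
    """Hypothetical oracle exposing the SAMP and PMF queries from the abstract."""

    def __init__(self, pmf, seed=None):
        # pmf[i] holds p[i + 1]; entries must be non-negative and sum to 1.
        self._pmf = list(pmf)
        self._rng = random.Random(seed)
        self._support = range(1, len(self._pmf) + 1)

    def samp(self):
        """SAMP query: return x in [n] with probability p[x]."""
        return self._rng.choices(self._support, weights=self._pmf, k=1)[0]

    def pmf(self, x):
        """PMF query: return p[x] for x in [n]."""
        return self._pmf[x - 1]


def estimate_shannon_entropy(oracle, num_samples):
    """Average log2(1 / p[x]) over sampled x; the expectation of this is H(p)."""
    total = 0.0
    for _ in range(num_samples):
        x = oracle.samp()                       # one SAMP query
        total += math.log2(1.0 / oracle.pmf(x)) # one PMF query
    return total / num_samples


def estimate_renyi_entropy(oracle, alpha, num_samples):
    """Estimate sum_x p[x]^alpha by the sample mean of p[x]^(alpha - 1),
    then plug it into H_alpha(p) = log2(sum_x p[x]^alpha) / (1 - alpha)."""
    total = 0.0
    for _ in range(num_samples):
        x = oracle.samp()
        total += oracle.pmf(x) ** (alpha - 1)
    return math.log2(total / num_samples) / (1.0 - alpha)
```

For the uniform distribution on [8], every sampled x has p[x] = 1/8, so both estimators return log2(8) = 3 regardless of which points are drawn. For non-uniform p the estimates fluctuate with the samples, and how many queries are needed for accuracy ±∆ is exactly the question the abstract addresses.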

Similar resources

Optimal Third Root Asymptotic Bounds in the Statistical Estimation of Thresholds

This paper is concerned with estimating the intersection point of two densities, given a sample of both of the densities. This problem arises in classification theory. The main results provide lower bounds for the probability of the estimation errors to be large on a scale determined by the inverse cube root of the sample size. As corollaries, we obtain probabilistic bounds for the prediction e...

Some properties of the parametric relative operator entropy

The notion of entropy was introduced by Clausius in 1850, and some of the main steps towards the consolidation of the concept were taken by Boltzmann and Gibbs. Since then several extensions and reformulations have been developed in various disciplines with motivations and applications in different subjects, such as statistical mechanics, information theory, and dynamical systems. Fujii and Kam...

Mathematical Methods for Supervised Learning

Let ρ be an unknown Borel measure defined on the space Z := X × Y with X ⊂ ℝ and Y = [−M, M]. Given a set z of m samples z_i = (x_i, y_i) drawn according to ρ, the problem of estimating a regression function f_ρ using these samples is considered. The main focus is to understand what is the rate of approximation, measured either in expectation or probability, that can be obtained under a given prior...

Estimating Upper and Lower Bounds For Industry Efficiency With Unknown Technology

With a brief review of the studies on the industry in Data Envelopment Analysis (DEA) framework, the present paper proposes inner and outer technologies when only some basic information is available about the technology. Furthermore, applying Linear Programming techniques, it also determines lower and upper bounds for directional distance function (DDF) measure, overall and allocative efficienc...

Entropy and Chaos in the Kac Model (arXiv:0808.3192v1 [math.PR], 23 Aug 2008)

We investigate the behavior in N of the N–particle entropy functional for Kac’s stochastic model of Boltzmann dynamics, and its relation to the entropy function for solutions of Kac’s one dimensional nonlinear model Boltzmann equation. We prove a number of results that bring together the notion of propagation of chaos, which Kac introduced in the context of this model, with the problem of estim...

Publication date: 2015